Compression-Based Averaging of Selective Naive Bayes Classifiers

Author

  • Marc Boullé
Abstract

The naive Bayes classifier has proved to be very effective on many real data applications. Its performance usually benefits from an accurate estimation of univariate conditional probabilities and from variable selection. However, although variable selection is a desirable feature, it is prone to overfitting. In this paper, we introduce a Bayesian regularization technique to select the most probable subset of variables compliant with the naive Bayes assumption. We also study the limits of Bayesian model averaging in the case of the naive Bayes assumption and introduce a new weighting scheme based on the ability of the models to conditionally compress the class labels. The weighting scheme on the models reduces to a weighting scheme on the variables, and finally results in a naive Bayes classifier with “soft variable selection”. Extensive experiments show that the compression-based averaged classifier outperforms the Bayesian model averaging scheme.
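The abstract's key idea — weighting models by how well they conditionally compress the class labels, then projecting the model weights onto variables to get "soft variable selection" — can be illustrated with a small sketch. This is not the paper's exact estimator: the subset enumeration, Laplace smoothing, and the exponential (softmax) model weighting used here as a stand-in for the paper's logarithmic smoothing are all simplifying assumptions for a toy categorical dataset.

```python
import math
from itertools import chain, combinations

def fit(X, y, n_values, classes):
    """Laplace-smoothed class priors and univariate conditionals P(x_j | c)."""
    n = len(y)
    priors = {c: (sum(1 for t in y if t == c) + 1) / (n + len(classes))
              for c in classes}
    cond = []
    for j in range(len(X[0])):
        cj = {}
        for c in classes:
            vals = [x[j] for x, t in zip(X, y) if t == c]
            cj[c] = {v: (vals.count(v) + 1) / (len(vals) + n_values[j])
                     for v in range(n_values[j])}
        cond.append(cj)
    return priors, cond

def score(x, c, priors, cond, w):
    """Weighted naive Bayes log-joint: log P(c) + sum_j w_j * log P(x_j | c)."""
    return math.log(priors[c]) + sum(
        wj * math.log(cond[j][c][x[j]]) for j, wj in enumerate(w))

def variable_weights(X, y, priors, cond, classes, d):
    """Score every variable subset by the conditional log-likelihood of the
    class labels (its ability to 'compress' them), convert scores to model
    weights with a softmax (a stand-in for the paper's smoothing), and
    project onto variables: w_j = total weight of subsets containing j."""
    subsets = list(chain.from_iterable(
        combinations(range(d), k) for k in range(d + 1)))
    scores = []
    for S in subsets:
        w = [1.0 if j in S else 0.0 for j in range(d)]
        ll = 0.0
        for x, t in zip(X, y):
            logs = {c: score(x, c, priors, cond, w) for c in classes}
            m = max(logs.values())
            logz = m + math.log(sum(math.exp(v - m) for v in logs.values()))
            ll += logs[t] - logz          # log P(t | x) under subset S
        scores.append(ll)
    m = max(scores)
    mw = [math.exp(s - m) for s in scores]
    total = sum(mw)
    return [sum(mw[i] for i, S in enumerate(subsets) if j in S) / total
            for j in range(d)]

# Toy data: variable 0 determines the class, variable 1 is noise.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (1, 1)]
y = [0, 0, 1, 1, 0, 1]
priors, cond = fit(X, y, n_values=[2, 2], classes=[0, 1])
w = variable_weights(X, y, priors, cond, classes=[0, 1], d=2)
print(w)  # the informative variable 0 should receive the larger weight
```

Predicting with these fractional weights in the weighted `score` function gives a single naive Bayes classifier where each variable contributes in proportion to the posterior mass of the models that include it — the "soft variable selection" the abstract describes.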


Related articles

Bayesian Model Averaging for Improving Performance of the Naïve Bayes Classifier

Feature selection has proved to be an effective way to reduce model complexity while giving a relatively desirable accuracy, especially when data is scarce or the acquisition of some features is expensive. However, the single selected model may not always generalize well for unseen test data, whereas other models may perform better. Bayesian Model Averaging (BMA) is a widely used approach to...


An Improved Predictive Accuracy Bound for Averaging Classifiers

We present an improved bound on the difference between training and test errors for voting classifiers. This improved averaging bound provides a theoretical justification for popular averaging techniques such as Bayesian classification, Maximum Entropy discrimination, Winnow and Bayes point machines and has implications for learning algorithm design.


Learning Selectively Conditioned Forest Structures with Applications to DBNs and Classification

Dealing with uncertainty in Bayesian Network structures using maximum a posteriori (MAP) estimation or Bayesian Model Averaging (BMA) is often intractable due to the superexponential number of possible directed acyclic graphs. When the prior is decomposable, two classes of graphs where efficient learning can take place are tree structures and fixed orderings with limited in-degree. We show how...



Credal Classification based on AODE and compression coefficients

Bayesian model averaging (BMA) is a common approach to averaging over alternative models; yet it usually becomes excessively concentrated around the single most probable model, achieving only sub-optimal classification performance. The compression-based approach (Boullé, 2007) overcomes this problem; it averages over the different models by applying a logarithmic smoothing over the models...




Journal:
  • Journal of Machine Learning Research

Volume 8, Issue

Pages  -

Publication date: 2007